{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. PP-TSM模型简介\n",
    "\n",
    "视频分类与图像分类相似，均属于识别任务，对于给定的输入视频，视频分类模型需要输出其预测的标签类别。如果标签都是行为类别，则该任务也常被称为行为识别。与图像分类不同的是，视频分类往往需要利用多帧图像之间的时序信息。PP-TSM是PaddleVideo自研的实用产业级视频分类模型，在实现前沿算法的基础上，考虑精度和速度的平衡，进行模型瘦身和精度优化，使其可能满足产业落地需求。\n",
    "\n",
    "PP-TSM基于ResNet-50骨干网络进行优化，从数据增强、网络结构微调、训练策略、BN层优化、预训练模型选择、模型蒸馏等6个方面进行模型调优。在基本不增加计算量的前提下，使用中心采样评估方式，PP-TSM在Kinetics-400上精度较原论文实现提升3.95个点，达到76.16%，超过同等骨干网络下的3D模型，且推理速度快4.5倍！\n",
    "\n",
    "更多关于PaddleVideo可以点击 https://github.com/PaddlePaddle/PaddleVideo 进行了解。\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 模型效果及应用场景\n",
    "### 2.1 视频分类任务：\n",
    "\n",
    "#### 2.1.1 数据集：\n",
    "\n",
    "数据集以Kinetics-400为主，分为训练集和测试集。\n",
    "\n",
    "#### 2.1.2 模型效果速览：\n",
    "\n",
    "PP-TSM在视频上的预测效果为：\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/22365664/200510601-85b04d54-e892-43db-8a92-679f623bb6e4.gif\"  width = \"80%\"  />\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. 模型如何使用\n",
    "\n",
    "### 3.1 模型推理：\n",
    "* 下载 \n",
    "\n",
    "（不在Jupyter Notebook上运行时需要将\"！\"或者\"%\"去掉。）\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!mkdir -p ~/work\n",
    "%cd ~/work\n",
    "# 克隆PaddleVideo（从gitee上更快）,本项目以做持久化处理，不用克隆了。\n",
    "!git clone https://github.com/PaddlePaddle/PaddleVideo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 安装"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 运行脚本需在PaddleVideo目录下\n",
    "%cd ~/work/PaddleVideo/\n",
    "\n",
    "# 安装所需依赖项【已经做持久化处理，无需再安装】\n",
    "!pip install --upgrade pip\n",
    "!pip install -r requirements.txt --user\n",
    "\n",
    "# 安装PaddleVideo\n",
    "!pip install ppvideo==2.3.0 --user"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 快速体验\n",
    "\n",
    "恭喜！ 您已经成功安装了PaddleVideo，接下来快速体验视频分类效果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!ppvideo --model_name='ppTSM' --use_gpu=False --video_file='data/example.avi'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上述代码会下载训练好的PP-TSM模型，基于CPU，对data/example.avi示例文件进行预测。\n",
    "\n",
    "输出日志结果如下：\n",
    "```txt\n",
    "Current video file: data/example.avi\n",
    "        top-1 classes: [5]\n",
    "        top-1 scores: [0.95056254]\n",
    "        top-1 label names: ['archery']\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 模型训练：\n",
    "* 克隆PaddleVideo仓库（详见3.1）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 准备数据集和预训练模型\n",
    "\n",
    "下面以Kinetics-400小数据集为例，演示模型训练过程。开发者也可以参考该数据格式，准备自己的训练数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 进入PaddleVideo工作目录\n",
    "%cd ~/work/PaddleVideo/\n",
    "\n",
    "# 下载Kinetics-400小数据集\n",
    "%pushd ./data/k400\n",
    "!wget -nc https://videotag.bj.bcebos.com/Data/k400_videos_small.tar\n",
    "!tar -xf k400_videos_small.tar\n",
    "%popd\n",
    "\n",
    "# 下载预训练模型\n",
    "!wget -nc -P ./data https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_vd_ssld_v2_pretrained.pdparams --no-check-certificate\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 修改yaml配置文件\n",
    "\n",
    "\n",
    "修改配置文件` configs/recognition/pptsm/pptsm_k400_videos_uniform.yaml`中的数据标注文件路径\n",
    "\n",
    "```\n",
    "DATASET: #DATASET field\n",
    "    batch_size: 4   # 根据GPU显存适当修改\n",
    "    num_workers: 4 \n",
    "    test_batch_size: 1\n",
    "    train:\n",
    "        format: \"VideoDataset\" \n",
    "        data_prefix: \"data/k400/videos\" \n",
    "        file_path: \"data/k400/train_small_videos.list \" #修改训练集路径\n",
    "    valid:\n",
    "        format: \"VideoDataset\" \n",
    "        data_prefix: \"data/k400/videos\" \n",
    "        file_path: \"data/k400/val_small_videos.list\" #修改验证集路径\n",
    "    test:\n",
    "        format: \"VideoDataset\"\n",
    "        data_prefix: \"data/k400/videos\" \n",
    "        file_path: \"data/k400/val_small_videos.list\" #修改验证集路径\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 训练模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%cd ~/work/PaddleVideo/\n",
    "%env CUDA_VISIBLE_DEVICES=0\n",
    "#开始训练\n",
    "!python main.py --validate -c configs/recognition/pptsm/pptsm_k400_videos_uniform.yaml"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 模型评估\n",
    "\n",
    "在训练时设置`--validate`，可以在训练时同步进行评估。对于训练好的模型，也可以使用如下命令进行评估。通过`-c`参数指定配置文件，`-w`参数指定待评估的模型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%cd ~/work/PaddleVideo/\n",
    "%env CUDA_VISIBLE_DEVICES=0\n",
    "\n",
    "#训练完以后，进行评估\n",
    "!python main.py --test -c configs/recognition/pptsm/pptsm_k400_videos_uniform.yaml -w output/ppTSM/ppTSM_best.pdparams"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. 模型原理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 采用 Temporal Shift Module（时序位移模块)\n",
    "\n",
    "PP-TSM 使用时序位移模块提取时序特征。通过通道移动的方法，在不增加任何额外参数量和计算量的情况下，极大地提升了模型对于视频时间信息的利用能力。\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleVideo/develop/docs/images/tsm_architecture.png\" height=250 width=700 hspace='10'/> <br />\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 数据增强 VideoMix\n",
    "\n",
    "对于视频Mix-up，PP-TSM将两个视频以一定的权值叠加构成新的输入视频，提升网络在时空上的抗干扰能力。\n",
    "\n",
    "* 精确BN precise BN\n",
    "\n",
    "为了获取更加精确的均值和方差供BN层在测试时使用，在实验中，我们会在网络训练完一个Epoch后，固定住网络中的参数不动，然后将训练数据输入网络做前向计算，保存下来每个step的均值和方差，最终得到所有训练样本精确的均值和方差，提升测试精度。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. 注意事项\n",
    "\n",
    "PP-TSM模型提供的各配置文件均放置在configs/recognition/pptsm目录下，配置文件名按如下格式组织:\n",
    "```模型名称_骨干网络名称_数据集名称_数据格式_测试方式_其它.yaml```\n",
    "\n",
    "* 数据格式包括frame和video，video表示使用在线解码的方式进行训练，frame表示先将视频解码成图像帧存储起来，训练时直接读取图片进行训练。使用不同数据格式，仅需修改配置文件中的DATASET和PIPELINE字段，参考pptsm_k400_frames_uniform.yaml和pptsm_k400_videos_uniform.yaml。注意，由于编解码的细微差异，两种格式训练得到的模型在精度上可能会有些许差异。\n",
    "\n",
    "* 测试方式包括uniform和dense，uniform表示中心采样，dense表示密集采样。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. 相关论文以及引用信息\n",
    "\n",
    "```\n",
    "@inproceedings{lin2019tsm,\n",
    "  title={TSM: Temporal Shift Module for Efficient Video Understanding},\n",
    "  author={Lin, Ji and Gan, Chuang and Han, Song},\n",
    "  booktitle={Proceedings of the IEEE International Conference on Computer Vision},\n",
    "  year={2019}\n",
    "} \n",
    "```\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
